Internal and external tagsets in part-of-speech tagging

Author

  • Thorsten Brants
Abstract

We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We evaluate this approach in an experiment and show that it performs significantly better than approaches using only one tagset.
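The split between the two tagsets can be illustrated with a minimal sketch. This is not the paper's implementation: the tag names, the mapping table, the probabilities, and the function names below are all hypothetical toy values chosen for illustration. The sketch decodes with a bigram Markov model over a finer internal tagset and then maps each internal tag to the coarser external tag that the tagger outputs.

```python
import math

# Hypothetical toy tagsets: the internal set splits the external tag "N"
# into two finer categories. All names and numbers here are assumptions.
INTERNAL_TO_EXTERNAL = {"DET": "DET", "N_COMMON": "N", "N_PROPER": "N", "V": "V"}
TAGS = list(INTERNAL_TO_EXTERNAL)

START_P = {"DET": 0.6, "N_PROPER": 0.3, "N_COMMON": 0.05, "V": 0.05}
TRANS_P = {
    "DET": {"N_COMMON": 0.9, "N_PROPER": 0.05, "V": 0.05},
    "N_COMMON": {"V": 0.8, "N_COMMON": 0.1, "DET": 0.1},
    "N_PROPER": {"V": 0.8, "N_PROPER": 0.1, "DET": 0.1},
    "V": {"DET": 0.6, "N_PROPER": 0.3, "N_COMMON": 0.1},
}
EMIT_P = {
    "DET": {"the": 1.0},
    "N_COMMON": {"dog": 0.5, "park": 0.4, "walks": 0.1},
    "N_PROPER": {"kim": 1.0},
    "V": {"walks": 0.6, "sees": 0.4},
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable internal tag sequence (bigram Markov model)."""
    def lp(p):
        # Log-probability with a small floor so unseen events are not -inf.
        return math.log(max(p, floor))

    # best[t] = (log-probability, path) of the best sequence ending in tag t
    best = {
        t: (lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)), [t])
        for t in tags
    }
    for w in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (best[p][0] + lp(trans_p[p].get(t, 0.0)), best[p][1]) for p in tags
            )
            new_best[t] = (score + lp(emit_p[t].get(w, 0.0)), path + [t])
        best = new_best
    return max(best.values())[1]


def tag_external(words):
    """Decode with the internal tagset, then report external tags only."""
    internal = viterbi(words, TAGS, START_P, TRANS_P, EMIT_P)
    return [INTERNAL_TO_EXTERNAL[t] for t in internal]


if __name__ == "__main__":
    sentence = ["kim", "sees", "the", "dog"]
    print(viterbi(sentence, TAGS, START_P, TRANS_P, EMIT_P))
    print(tag_external(sentence))
```

In this toy setup the model distinguishes `N_PROPER` from `N_COMMON` while decoding, but both collapse to the external tag `N` in the output, so accuracy would be measured against the external tagset only.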

Similar resources

Evaluating Distributional Properties of Tagsets

We investigate which distributional properties should be present in a tagset by examining different mappings of various current part-of-speech tagsets, looking at English, German, and Italian corpora. Given the importance of distributional information, we present a simple model for evaluating how a tagset mapping captures distribution, specifically by utilizing a notion of frames to capture the ...

Tagset Mapping and Statistical Training Data Cleaning-up

The paper describes a general method (as well as its implementation and evaluation) for deriving mapping systems for different tagsets available in existing training corpora (gold standards) for a specific language. For each pair of corpora (tagged with different tagsets), one such mapping system is derived. This mapping system is then used to improve the tagging of each of the two corpora with...

POS für(s) FOLK - Part of Speech Tagging des Forschungs- und Lehrkorpus Gesprochenes Deutsch

1 Introduction. The FOLK project (Forschungs- und Lehrkorpus Gesprochenes Deutsch, the Research and Teaching Corpus of Spoken German) is building a large conversation corpus, open to the research community, at the Institut für Deutsche Sprache (IDS). Within this project, automated part-of-speech tagging (POS tagging) of spontaneous speech is to be made possible with the help of the TreeTagger (Schmid 1995) and the Stuttgart-Tübingen Tagset (STTS) (Schiller et al. 1999)...

A part-of-speech tagging system for the Persian language

Abstract: Part-of-speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Methods for Amharic Part-of-Speech Tagging

The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy approach, while HMM-based and SVM-based ta...

Publication date: 1997